Two dependent samples test

We first create data. In particular, we create two vectors with two categories each and cross-tabulate them:

set.seed(123)
x <- rbinom(n = 60, size = 1, prob = 0.3)
y <- rbinom(n = 60, size = 1, prob = 0.6)
mat <- table(x, y)
dimnames(mat) <- list(before = c("yes","no"),
                      after = c("yes","no"))
mat
##       after
## before yes no
##    yes  17 25
##    no    8 10

Let’s assume that these are paired observations, i.e. the same 60 subjects measured before and after. The null hypothesis is that the probability of before = no and after = yes equals the probability of before = yes and after = no; in other words, the two kinds of discordant pairs are equally likely. We can use McNemar's test for paired data:

mcnemar.test(x = mat)
## 
##  McNemar's Chi-squared test with continuity correction
## 
## data:  mat
## McNemar's chi-squared = 7.7576, df = 1, p-value = 0.005349
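McNemar's statistic depends only on the discordant cells of the table, i.e. the off-diagonal counts n12 and n21. With the continuity correction it is (|n12 - n21| - 1)^2 / (n12 + n21), and we can verify the reported value by hand:

```r
# Discordant cells: subjects whose answer changed
n12 <- mat["yes", "no"]   # before = yes, after = no (25)
n21 <- mat["no", "yes"]   # before = no, after = yes (8)

# Continuity-corrected statistic: (|25 - 8| - 1)^2 / 33 = 256/33
(abs(n12 - n21) - 1)^2 / (n12 + n21)
## [1] 7.757576
```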

Alternatively, we can pass the raw vectors directly:

mcnemar.test(x = x, y = y)
## 
##  McNemar's Chi-squared test with continuity correction
## 
## data:  x and y
## McNemar's chi-squared = 7.7576, df = 1, p-value = 0.005349
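The result is identical because mcnemar.test() simply tabulates the two vectors internally; the counts match mat, only the dimnames differ:

```r
all(table(x, y) == mat)   # same counts, only the labels differ
## [1] TRUE
```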

By default the function applies a continuity correction. We can turn it off with the argument correct = FALSE:

mcnemar.test(x = mat, correct = FALSE)
## 
##  McNemar's Chi-squared test
## 
## data:  mat
## McNemar's chi-squared = 8.7576, df = 1, p-value = 0.003083
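Without the correction the -1 term is dropped and the statistic becomes (n12 - n21)^2 / (n12 + n21), which explains the larger value:

```r
n12 <- mat["yes", "no"]   # 25 discordant pairs in one direction
n21 <- mat["no", "yes"]   # 8 in the other direction
(n12 - n21)^2 / (n12 + n21)   # 17^2 / 33 = 289/33
## [1] 8.757576
```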

Multiple testing

Performing many comparisons within one experiment increases the likelihood of obtaining at least one false-positive result: each additional test adds another chance of a type I error. We therefore need to correct for multiple testing, which can be done with the function p.adjust(). Let’s assume that we have performed the following tests:

set.seed(123)
x <- rnorm(n = 300, mean = 10, sd = 5)
y <- rnorm(n = 300, mean = 11, sd = 2)
z <- rnorm(n = 300, mean = 15, sd = 3)
w <- rbinom(n = 300, size = 1, prob = 0.6)
u <- rbinom(n = 300, size = 1, prob = 0.4)

test1 <- t.test(x = x, y = y)
test2 <- t.test(x = x, y = z)
test3 <- t.test(x = y, y = z)
test4 <- mcnemar.test(x = w, y = u)

The corresponding p-values are then:

test1$p.value
## [1] 0.004475476
test2$p.value
## [1] 3.344518e-42
test3$p.value
## [1] 3.785251e-61
test4$p.value
## [1] 2.235955e-07
# Adjustment using the Bonferroni method:
p_adj_Bonf <- p.adjust(p = c(test1$p.value, test2$p.value, test3$p.value,
                         test4$p.value), method = 'bonferroni')
p_adj_Bonf
## [1] 1.790191e-02 1.337807e-41 1.514100e-60 8.943818e-07
# Adjustment using the Holm method
p_adj_Holm <- p.adjust(p = c(test1$p.value, test2$p.value, test3$p.value,
                         test4$p.value), method = "holm")
p_adj_Holm
## [1] 4.475476e-03 1.003355e-41 1.514100e-60 4.471909e-07
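To see what p.adjust() is doing, both methods can be reproduced by hand. Bonferroni multiplies each p-value by the number of tests m (capped at 1). Holm sorts the p-values, multiplies the i-th smallest by m - i + 1, and enforces that the adjusted values are non-decreasing:

```r
p <- c(test1$p.value, test2$p.value, test3$p.value, test4$p.value)
m <- length(p)

# Bonferroni by hand: identical to p_adj_Bonf above
pmin(1, m * p)

# Holm by hand: identical to p_adj_Holm above
o   <- order(p)                                   # positions of the sorted p-values
adj <- pmin(1, cummax((m - seq_len(m) + 1) * p[o]))
adj[order(o)]                                     # back to the original order
```

Because Holm's multipliers shrink from m down to 1, it is uniformly less conservative than Bonferroni while still controlling the family-wise error rate; note that the largest p-value (test1) is left unchanged by Holm but quadrupled by Bonferroni.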